Point Mutations Effects on Charge Transport Properties of the Tumor-Suppressor Gene p53
We report on a theoretical study of the effects of point mutations on charge transfer
properties in the DNA sequence of the tumor-suppressor p53 gene. On the basis of effective
single-strand or double-strand tight-binding models which simulate hole propagation along
the DNA, a statistical analysis of charge transmission modulations associated with all
possible point mutations is performed. We find that, in contrast to non-cancerous
mutations, mutation hotspots tend to result in significantly weaker changes of transmission
properties. This suggests that charge transport could play a significant role in the
DNA-repair deficiency that yields carcinogenesis.

PACS numbers: 87.15.Aa, 87.14.Gg, 87.19.Xx 



The charge transfer properties and long-range oxidation mechanisms in DNA
molecules are believed to play a critical role in living organisms. For
instance, it is believed that base excision repair (BER) enzymes locate DNA
base lesions or mismatches by probing the DNA-mediated charge transport (CT).
The p53 gene is said to be the "guardian of the genome" since it encodes the
TP53 protein, which suppresses tumor development by activating the DNA repair
mechanisms or, if the DNA damage is irreparable, the cell apoptosis process.
More than 50% of human cancers are related to mutations of the p53 gene that
jeopardize the efficient functioning of TP53. Most of the cancerous mutations
are point mutations, in which a base pair is substituted by another, with
distributions along the DNA sequence that are highly non-uniform. The positions
where the mutations occur most frequently are called the "hotspots" of
mutations. Each point mutation can be characterized by two parameters k and s,
respectively representing the position of the mutation on the sequence and the
nucleotide substituting the original one. From the IARC database one finds that
most hotspots of p53 are located in exons 5, 6, 7, and 8, in the interval from
the 13055th to the 14588th nucleotide. The distribution of the point mutations
in this range is reported in Fig. 1. In this Letter, by using single- and
double-strand tight-binding models with parameters fitted from ab initio
calculations, the charge transmission changes owing to cancerous and
non-cancerous point mutations are statistically studied for the p53 gene. We
find that anomalously small changes of the charge transfer efficiency coincide
with cancerous mutations. In contrast, non-cancerous mutations result, on
average, in much larger changes of the CT properties. From this analysis, we
suggest a new scenario of how cancerous mutations could shortcut the DNA
damage/repair processes and hence yield carcinogenesis.

FIG. 1: Mutation frequency of each site (thin lines) and averaged transmission
coefficient $\bar{T}_{j,80}$ (dashed line) as a function of the position $j$
along the sequence. Vertical dotted lines denote known regions of frequent
mutations (hotspots).

A simple but physically reasonable description of coherent hole transport in
single-strand DNA is given by an effective tight-binding Hamiltonian

$$H = \sum_{n=1}^{N} \epsilon_n c_n^\dagger c_n + \sum_{n=1}^{N-1} t_{n,n+1} \left( c_n^\dagger c_{n+1} + \mathrm{h.c.} \right), \tag{1}$$



where each lattice point represents a nucleotide base (A, T, C, G) of the chain
for $n = 1, \ldots, N$. This one-leg (1L) model is shown schematically in
Fig. 2(a). In this tight-binding formalism, $c_n^\dagger$ ($c_n$) is the
creation (destruction) operator of a hole at the $n$th site. The $t_{n,n+1}$
are the hopping integrals along the DNA, and $\epsilon_n$ is related to the
ionization potential at the $n$th site. The electronic energetics of a DNA
chain should take into account three different contributions coming from the
nucleobase system, the backbone system and the environment. We emphasize that
in many of the models to be used here, simplifying assumptions about these
energy scales have to be employed. Mostly, however, the ionization energies
$\epsilon_G = 7.75$ eV, $\epsilon_C = 8.87$ eV, $\epsilon_A = 8.24$ eV and
$\epsilon_T = 9.14$ eV are taken as suitable approximations for the onsite
energetics at each base, as well as 7.75 eV for the electrodes. Furthermore, in
the 1L model $t_{n,n+1}$ is assumed to be nucleotide-independent with
$t_{n,n+1} = 0.4$ eV, following prior modelling in agreement with ab initio
calculations.

FIG. 2: Schematic models for hole transport in DNA. The nucleobases are given
as (grey) circles. Electronic pathways are shown as lines, and dashed lines and
circles denote the sugar-phosphate backbone. Graph (a) shows effective models
1L and FB (with dashed backbone) for transport along a single channel, whereas
graph (b) depicts possible two-channel transport models 2L and LM (with dashed
backbone).

A straightforward generalization of model (1) is a two-leg ladder model (2L),
as shown in Fig. 2(b). The hopping between like base pairs (AT/AT, GC/GC,
etc.) is chosen as 0.35 eV, between unlike base pairs it is 0.17 eV; the
interchain hopping is $t_\perp = 0.1$ eV. Other models include sites which
represent the sugar-phosphate backbone of DNA but along which no electron
transport is allowed (cf. Fig. 2). In the following we call the one-channel
variety the fishbone model (FB) and the two-channel version the ladder model
(LM). The additional hopping onto the backbone is 0.7 eV and the backbone
onsite energy is taken to be 8.5 eV, roughly equal to the mean of all onsite
energies for the base pairs.
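
As an illustration of how the 2L parameters quoted above fit together, the
following minimal sketch assembles a dense Hamiltonian matrix for a short
two-leg ladder. It is not the authors' code. The function name
`ladder_hamiltonian`, the site ordering, the use of the Watson-Crick complement
for the second strand, and the reading of "like" as "two identical neighbouring
base pairs" are all assumptions made here for illustration.

```python
import numpy as np

# Parameters quoted in the text (all in eV).
EPS = {"G": 7.75, "C": 8.87, "A": 8.24, "T": 9.14}
T_LIKE, T_UNLIKE, T_PERP = 0.35, 0.17, 0.1
# Assumption: the second strand is the Watson-Crick complement of the first.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def ladder_hamiltonian(seq):
    """Dense 2L Hamiltonian; sites 0..N-1 are strand 1, N..2N-1 are strand 2."""
    n = len(seq)
    h = np.zeros((2 * n, 2 * n))
    for i, base in enumerate(seq):
        h[i, i] = EPS[base]                        # onsite energy, strand 1
        h[n + i, n + i] = EPS[COMPLEMENT[base]]    # onsite energy, strand 2
        h[i, n + i] = h[n + i, i] = T_PERP         # interchain hopping
    for i in range(n - 1):
        # Assumption: "like" means the neighbouring base pairs are identical.
        t = T_LIKE if seq[i] == seq[i + 1] else T_UNLIKE
        for a, b in ((i, i + 1), (n + i, n + i + 1)):
            h[a, b] = h[b, a] = t                  # intrachain hopping
    return h


h = ladder_hamiltonian("GGCA")
```

A dense matrix is only practical for short segments; it is used here because it
makes the connectivity of the ladder easy to inspect.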

The most convenient method for studying the transport properties of these four
quasi-one-dimensional tight-binding models is the transfer-matrix method, which
allows us to determine the transmission coefficient $T(E)$ of hole states in
systems with varying cross section $M$ and length $L \gg M$. Briefly, we can
solve for the eigenstates $|\Psi\rangle = \sum_n \psi_n |n\rangle$ of the
Hamiltonian, where $|n\rangle$ represents the state in which the hole is
located at the $n$th site, as
$(\psi_{L+1}, \psi_L)^T = \tau_L \cdot (\psi_1, \psi_0)^T$, where $\tau_L(E)$
is the global transfer matrix and $E$ is the energy of the injected carrier.
The transmission $T(E)$ is given in terms of $\tau_L(E)$ by a simple analytic
formula for the 1L and FB models and can be computed from the localization
lengths for the 2L and LM models.
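
For the 1L model, the transfer-matrix idea can be sketched in a few lines. The
code below is a hedged illustration rather than the authors' implementation.
Instead of the analytic formula mentioned above, it obtains $T(E)$ by matching
the global transfer matrix to plane waves in the electrodes. It assumes that
the electrodes are semi-infinite uniform chains with onsite energy 7.75 eV and
the same hopping 0.4 eV as the DNA, and that the hopping term enters the
Hamiltonian with a positive sign. The name `transmission` and its signature are
hypothetical.

```python
import numpy as np

EPS = {"G": 7.75, "C": 8.87, "A": 8.24, "T": 9.14}  # onsite energies (eV)
T_HOP = 0.4       # nucleotide-independent hopping (eV)
EPS_LEAD = 7.75   # electrode onsite energy (eV)


def transmission(seq, energy, t=T_HOP, eps_lead=EPS_LEAD):
    """Transmission coefficient T(E) of a 1L chain between uniform leads."""
    x = (energy - eps_lead) / t
    if abs(x) >= 2.0:
        return 0.0  # no propagating state in the leads at this energy
    # Lead dispersion E = eps_lead + 2 t cos(k); for a real Hamiltonian the
    # sign chosen for k does not change |amplitude|^2.
    k = np.arccos(x / 2.0)

    # Global transfer matrix: (psi_{N+1}, psi_N)^T = tau . (psi_1, psi_0)^T
    tau = np.eye(2)
    for base in seq:
        step = np.array([[(energy - EPS[base]) / t, -1.0], [1.0, 0.0]])
        tau = step @ tau

    # Match to exp(ikn) + r exp(-ikn) on the left, amp_t exp(ikn) on the right.
    n = len(seq)
    plus, minus = np.exp(1j * k), np.exp(-1j * k)
    lhs = np.array([
        [np.exp(1j * k * (n + 1)), -(tau[0, 0] * minus + tau[0, 1])],
        [np.exp(1j * k * n), -(tau[1, 0] * minus + tau[1, 1])],
    ])
    rhs = np.array([tau[0, 0] * plus + tau[0, 1],
                    tau[1, 0] * plus + tau[1, 1]])
    amp_t, amp_r = np.linalg.solve(lhs, rhs)
    return float(abs(amp_t) ** 2)


# Example: transmission spectrum of a short, arbitrary test sequence.
energies = np.linspace(7.0, 8.5, 151)
spectrum = [transmission("GATTACAGGC", e) for e in energies]
```

A homogeneous poly-G chain is indistinguishable from the electrodes in this
setup, so it transmits perfectly inside the band; this provides a quick sanity
check. For long segments the plain matrix product can become numerically
ill-conditioned, and a production code would need to stabilise it.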

Let us define $S = (s_1, s_2, \ldots, s_{20303})$ as the sequence of the p53
gene (NCBI access number X54156, 20303 base pairs), whereas $S_{j,L}$ is a
segment of $S$ with length $L$ starting at the $j$th base pair, i.e.
$S_{j,L}(n) = S(j-1+n)$ with $n = 1, 2, \ldots, L$. Next, we denote by
$T_{j,L}(E)$ the transmission coefficient corresponding to $S_{j,L}$. We then
characterize the energy-averaged CT for the $j$th site with segment length $L$
as the value $\bar{T}_{j,L}$ obtained by integrating the transmission over all
incident energies and averaging over all $L$ segments of p53 of length $L$
that contain the $j$th site, such that

$$\bar{T}_{j,L} = \frac{1}{L} \sum_{n=j-L+1}^{j} \frac{1}{E_1 - E_0} \int_{E_0}^{E_1} T_{n,L}(E)\, dE, \tag{2}$$

where $n$ is further restricted to $1 \le n \le 20304 - L$ close to the
boundaries; $E_0$ and $E_1$ denote a suitable energy window which we shall
normally choose to equal the extrema of the energy spectrum for each model. In
Fig. 1 we show $\bar{T}_{j,80}$ for model 1L and the base pair range
$13000 \le j \le 14800$ where most cancerous mutations occur. The positions of
four groups of hotspots, i.e. peaks of the mutation frequency, corresponding
to the four exons (5th to 8th), coincide with local minima of
$\bar{T}_{j,80}$.
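
The site-resolved, energy-averaged transmission described above can be sketched
as follows. This is an illustrative outline under stated assumptions, not the
authors' code: `trans` stands for any function returning $T(E)$ for a segment
(it is a hypothetical callable supplied by the user), the energy integral is
approximated by the trapezoidal rule on a uniform grid, and near the ends of
the sequence the average is taken over the windows that actually exist.

```python
import numpy as np


def energy_average(trans, segment, e0, e1, n_energy=201):
    """Mean of trans(segment, E) over the window [e0, e1] (trapezoidal rule)."""
    energies = np.linspace(e0, e1, n_energy)
    values = np.array([trans(segment, e) for e in energies])
    de = energies[1] - energies[0]
    integral = de * (values.sum() - 0.5 * (values[0] + values[-1]))
    return integral / (e1 - e0)


def averaged_transmission(trans, seq, j, length, e0, e1):
    """Average over all windows of the given length containing site j (1-based)."""
    n_max = len(seq) - length + 1          # last valid start position
    starts = range(max(1, j - length + 1), min(j, n_max) + 1)
    total = sum(energy_average(trans, seq[n - 1:n - 1 + length], e0, e1)
                for n in starts)
    # Assumption: divide by the number of available windows, which equals
    # the segment length away from the sequence boundaries.
    return total / len(starts)


# Toy stand-in for T(E): the fraction of G bases, independent of energy.
def toy_trans(segment, energy):
    return segment.count("G") / len(segment)


value = averaged_transmission(toy_trans, "GGGGAAAA", j=4, length=2,
                              e0=7.0, e1=9.0)
```

Passing the transmission function as an argument keeps the averaging logic
independent of the particular model (1L, FB, 2L or LM) used to compute $T(E)$.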

If the $k$th base of the p53 sequence is mutated from $s_k$ to $s$ and
$j \le k \le j+L-1$, we denote the mutated segment containing this mutation as
$S_{j,L}^{k,s}$ such that $S_{j,L}^{k,s}(k-j+1) = s$ and
$S_{j,L}^{k,s}(i) = S_{j,L}(i)$ for all $i \ne k-j+1$. The corresponding
transmission coefficients of the original and mutated sequences are denoted as
$T_{j,L}(E)$ and $T_{j,L}^{k,s}(E)$, respectively. Similarly, we define the
energy-averaged squared difference in transmission coefficient between the
original and mutated sequences as

$$\Delta_{j,L}^{k,s} = \frac{1}{E_1 - E_0} \int_{E_0}^{E_1} \left| T_{j,L}(E) - T_{j,L}^{k,s}(E) \right|^2 dE. \tag{3}$$
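
The construction of a mutated segment and the energy-averaged squared
difference between original and mutated transmission can be sketched as below.
This is a minimal illustration, not the authors' implementation: the names
`mutate` and `delta` are hypothetical, `trans` is any user-supplied function
returning $T(E)$ for a segment, the normalisation by the width of the energy
window is an assumption, and the integral is approximated by the trapezoidal
rule.

```python
import numpy as np


def mutate(segment, position, s):
    """Return the segment with the base at `position` (1-based) replaced by s."""
    return segment[:position - 1] + s + segment[position:]


def delta(trans, seq, j, length, k, s, e0, e1, n_energy=201):
    """Energy-averaged squared change of T(E) caused by the mutation (k, s)."""
    if not j <= k <= j + length - 1:
        raise ValueError("site k lies outside the segment")
    original = seq[j - 1:j - 1 + length]
    mutated = mutate(original, k - j + 1, s)   # local index of site k
    energies = np.linspace(e0, e1, n_energy)
    diff2 = np.array([(trans(original, e) - trans(mutated, e)) ** 2
                      for e in energies])
    de = energies[1] - energies[0]
    integral = de * (diff2.sum() - 0.5 * (diff2[0] + diff2[-1]))
    return integral / (e1 - e0)


# Toy stand-in for T(E): the fraction of G bases, independent of energy.
def toy_trans(segment, energy):
    return segment.count("G") / len(segment)


change = delta(toy_trans, "GGGGAAAA", j=3, length=4, k=4, s="T",
               e0=7.0, e1=9.0)
```

In this toy example the segment GGAA becomes GTAA, so the stand-in transmission
drops from 0.5 to 0.25 and the averaged squared difference is 0.0625.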
The 14585th base pair of the p53 sequence is a particularly active hotspot
with 133 entries in the IARC database. It exhibits mutations from C to T,
causing various types of cancer. However, the mutations C → G and C → A at the
same position are not cancerous. The effects of the cancerous C → T and the
non-cancerous C → A and C → G mutations on the CT properties are shown in
Fig. 3, which gives the transmission coefficients $T_{14575,20}(E)$ and
$T_{14575,20}^{14585,s}(E)$ with $s$ = T, A and G. We find that for most
energies the mutation C → T results in the weakest change in $T(E)$. To
evaluate the change of CT for all mutations in p53 quantitatively, the
$\Delta_{14575,20}^{14585,s}$ are computed for all four models. The results
are shown in Table I. We see that the cancerous mutation C → T shows the
smallest relative change in CT for nearly all models. The only differences
occur for small $L = 20$ in models FB and LM but vanish quickly for larger
$L$. Hence for a damage-repair process which uses a CT-based criterion as a
detection mechanism, this mutation will be the hardest to identify. These
results seem to suggest a scenario in which certain mutations might avoid the
CT-driven DNA damage-repair mechanism and survive to develop cancerous tumors.
We have checked that this trend is independent of the specific model and
hotspot chosen by also analysing the hotspots 13117, 13203, 13334, 13419,
14060, 14069, 14070, 14074, 14076, 14486, 14487, 14501, 14513, and 17602 of
the IARC TP53 database for DNA segment lengths $L = 10, 20, \ldots, 160$. We
find that the number of cases in which a cancerous mutation corresponds to a
segment of low transmission change is within 5%-15% the same for models 1L,
FB, 2L and LM, with results for 1L and FB very similar to each other. The
models 2L and LM are within 15% of each other and have only a slightly smaller
occurrence of these cases of low transmission change and high cancerousness
than 1L and FB. Thus in the following, we shall restrict our analysis to the
simple case of the strictly 1D model 1L given by Eq. (1).

FIG. 3: Energy dependence of logarithmic transmission coefficients
$T_{14575,20}^{14585,s}(E)$ of the original sequence (C, shaded solid line)
and mutated (A dotted, G dotted-dashed, T dashed) sequences with length
$L = 20$ (from the 14575th to the 14594th nucleotide) of p53. The left panel
shows results for model 1L, the right two panels denote the two transport
windows for the fishbone model.

TABLE I: Renormalized values of the energy-averaged changes
$\Delta_{14575,20}^{14585,s}$ in transmission properties for the 4
tight-binding models. All data are shown with at most 3 significant figures.
Common multiplication factors for each group of data for given $L$ and
mutations C → A, G and T are suppressed. Entries marked with * denote minima
for the CT change.

Mutation    L     1L       FB        2L       LM
C → A       20    23.1     8.46      2.24     0.43*
C → G       20    37.6     0.73*     0.83     0.57
C → T       20    5.63*    1.08      0.34*    0.66
C → A       30    15.7     54.8      96.2     1.76
C → G       30    21.4     0.55      2.75     0.40
C → T       30    9.14*    0.0006*   0.39*    0.15*
C → A       40    1.16     30.7      31.6     17.7
C → G       40    2.21     0.72      0.41     0.16
C → T       40    0.40*    0.009*    0.26*    0.04*

Experimentally, the BER enzymes can locate the damaged sites at a distance of
19 base pairs on the DNA strand by probing the CT of the segment bound by the
enzymes. If a mutation changes the CT only slightly, the enzymes might thus
not be able to find it and the repair mechanism will not be activated. On the
other hand, the fact that the mutations C → G and C → A are not found in
cancer cells does not mean that these mutations do not occur. Rather, the
changes in CT induced by them are more significant, which could allow an
easier detection by CT-probing enzymes. Accordingly, these two types of
mutations will be repaired and cancer will not develop.

FIG. 4: Scatter plots of $\Gamma(k, s; L)$ versus occurrence frequency of all
cancerous mutations $s$ corresponding to the hotspots $\{k\}$ for (a) $L = 20$
and (b) 80.

In order to challenge such a scenario, the change of CT for all
20303 × 3 = 60909 types of possible point mutations is examined. The average
effect of a mutation $(k, s)$ of a subsequence with length $L$ on the CT of
p53 is defined as

$$\Gamma(k, s; L) = \frac{1}{L} \sum_{j=k-L+1}^{k} \Delta_{j,L}^{k,s}, \tag{4}$$

where $j$ also satisfies $1 \le j \le 20304 - L$ close to the boundaries.
Fig. 4 shows the scatter plots of $\Gamma(k, s; L)$ versus frequency of all
cancerous mutations for (a) $L = 20$ and (b) 80. The sharp peaks at small
$\Gamma$ agree with the scenario that the most cancerous mutations, namely
those with high frequency, change the CT only slightly and thus have smaller
$\Gamma$.

Let us now compare the CT change (i) for the set $\mathcal{M}$ of all 60909
possible point mutations of p53, (ii) for the set $\mathcal{M}_c$ of the 1953
cancerous point mutations in the IARC database, and (iii) for the set
$\mathcal{M}_{c,10}$ of the 366 mutations which are found more than 10 times
in cancer tissues. For given $L$, we sort the CT results for $\mathcal{M}$
according to the computed magnitude of $\Gamma(k, s; L)$ and determine the
rank $r(k, s; L) \in [1, 60909]$ of the CT change for each mutation $(k, s)$.
A smaller rank means less CT change for the mutation.
$\gamma(k, s; L) = 100\% \times r(k, s; L)/60909$ is then the relative rank in
percentage.
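
The averaging over windows and the subsequent ranking can be sketched as
follows. This is a hedged illustration, not the authors' code: `delta_fn` is a
hypothetical callable returning the CT change for the window starting at $j$,
the average is taken over the windows that exist near the sequence ends, and
ties in the ranking are broken by input order, which the text does not
specify.

```python
import numpy as np


def gamma(delta_fn, n_sites, k, s, length):
    """Average of delta_fn(j, length, k, s) over all windows containing site k."""
    first = max(1, k - length + 1)
    last = min(k, n_sites - length + 1)
    starts = range(first, last + 1)
    # Assumption: normalise by the number of available windows, which equals
    # the segment length away from the sequence boundaries.
    return sum(delta_fn(j, length, k, s) for j in starts) / len(starts)


def relative_rank(gammas):
    """Relative rank in percent: 100 * rank / number of mutations."""
    gammas = np.asarray(gammas, dtype=float)
    order = np.argsort(gammas, kind="stable")   # smallest CT change first
    ranks = np.empty(len(gammas), dtype=int)
    ranks[order] = np.arange(1, len(gammas) + 1)
    return 100.0 * ranks / len(gammas)


# Toy stand-in for the CT change: depends only on the window start j.
def toy_delta(j, length, k, s):
    return float(j)


g = gamma(toy_delta, n_sites=100, k=10, s="T", length=4)   # windows j = 7..10
percent = relative_rank([0.3, 0.1, 0.2, 0.4])
```

With the full set of 60909 mutations, `relative_rank` would be applied once per
segment length to the complete array of $\Gamma$ values, and the cancerous
subsets would then be selected from the result.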

The histograms of the distribution of $\gamma(k, s; L)$ are shown in
Fig. 5(a) and (b) for the mutations $(k, s)$ of $\mathcal{M}_c$ and
$\mathcal{M}_{c,10}$. The vertical axis is the percentage of mutations in
$\mathcal{M}_c$ (grey bars) and $\mathcal{M}_{c,10}$ (black bars) whose
$\gamma(k, s; L)$ belong to the corresponding bin, with each bin width set to
5%. For $\mathcal{M}$, the result is the dashed line at a value of 5%. The
distributions for $\mathcal{M}_c$ and $\mathcal{M}_{c,10}$ are clearly biased
towards smaller values of $\gamma$, especially for the $L = 80$ case. For
example, about 9% of the mutations in the $\mathcal{M}_{c,10}$ set for
$L = 20$, and 27% for $L = 80$, have $\gamma(k, s; L)$ values smaller than 5%.
This indicates that the cancerous mutations in $\mathcal{M}_c$ and
$\mathcal{M}_{c,10}$ result in smaller CT changes than non-cancerous ones. The
distribution bias is more apparent in $\mathcal{M}_{c,10}$ than in
$\mathcal{M}_c$, in agreement with the choice of mutations.

FIG. 5: Histogram of the distribution of $\gamma(k, s; L)$ in $\mathcal{M}_c$
(light wide bars) and $\mathcal{M}_{c,10}$ (dark thin bars), where the
mutation changes the $k$th nucleotide to $s$, for (a) $L = 20$ and (b) 80. For
$\mathcal{M}$, all values are equal to 5% in the 20 intervals as indicated by
the horizontal dashed lines. (c) shows the percentage of $\gamma(k, s; L)$
values in $\mathcal{M}_{c,10}$ for small CT change as a function of DNA length
in the ranges 0-5% (black), 5-10% (dark grey), 10-15% (light grey) and 15-20%
(white). Similarly, (d) indicates large CT change for $\mathcal{M}_{c,10}$ in
the ranges 80-85% (black), 85-90% (dark grey), 90-95% (light grey) and 95-100%
(white). The horizontal dashed lines in (c) and (d) indicate the distributions
for $\mathcal{M}$.

Let us also evaluate the dependence of the CT change on different $L$.
Figs. 5(c) and (d) show the accumulated percentage of mutations in
$\mathcal{M}_{c,10}$ whose $\gamma(k, s; L)$ values are smaller than 20% and
larger than 80%, respectively. We see that around $L = 90$, more than 50% of
the mutations in $\mathcal{M}_{c,10}$ have a relative rank of CT change below
20%, and the number of cancerous mutations with a relative rank of 80% or more
is much less than average for all $L$.
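
The binning of relative ranks and the accumulated low and high tails discussed
above can be sketched as follows. This is an illustrative outline only: the
function names are hypothetical, and the input is assumed to be an array of
relative ranks in percent for one subset of mutations at a fixed segment
length.

```python
import numpy as np


def rank_histogram(gamma_percent, bin_width=5.0):
    """Percentage of mutations per bin of relative rank (20 bins for 5% width)."""
    edges = np.arange(0.0, 100.0 + bin_width, bin_width)
    counts, _ = np.histogram(gamma_percent, bins=edges)
    return 100.0 * counts / len(gamma_percent)


def tail_fractions(gamma_percent, low=20.0, high=80.0):
    """Percentage of mutations with relative rank below `low` and above `high`."""
    g = np.asarray(gamma_percent, dtype=float)
    return 100.0 * np.mean(g < low), 100.0 * np.mean(g > high)


# Toy relative ranks (percent) for a hypothetical subset of four mutations.
sample = [1.0, 2.0, 7.0, 99.0]
hist = rank_histogram(sample)
low_tail, high_tail = tail_fractions(sample)
```

For the full set of all mutations every 5% bin holds 5% of the entries by
construction, so any excess in the lowest bins of a subset directly measures
its bias towards small CT change.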

In summary, we find that (i) the conductance of hotspots of cancerous
mutations is smaller than that of other sites, (ii) on average the cancerous
mutations of the gene yield smaller changes of the CT than non-cancerous
mutations, and (iii) the tendency in (ii) is stronger in the set of highly
cancerous mutations with occurrence frequency > 10. These results suggest a
possible scenario of how cancerous mutations might circumvent the DNA
damage-repair mechanism and survive to yield carcinogenesis. However, our
analysis is only valid in a statistical sense and we do observe occasional
non-cancerous mutations with weak change of CT. For these, other DNA repair
processes should exist, and we therefore do not intend to claim that DNA
damage repair solely uses a CT-based criterion. Still, our results exhibit an
intriguing and new correlation between the electronic structure of DNA
hotspots and the DNA damage-repair process.

Further studies should investigate how robust our conclusions are with regard
to electron-phonon coupling effects, electronic correlations, or metal/DNA
contact interactions. Since mesoscopic transport measurements of DNA sequences
of several tens of base pairs have been demonstrated, our theoretical results
could be challenged by investigating charge transfer in wild-type and mutated
short synthesized sequences of the p53 gene.